Nucleotide sequence encoding a 14 kDa protein from goat liver

ABSTRACT

A complete cDNA sequence (SEQ ID NO: 1) is described, which codes for a 14 kDa protein (SEQ ID NO: 2) obtainable from goat liver. This protein shows marked anti-neoplastic activity both in vitro and in vivo. The serum of animals immunized with this protein displays cytotoxic activity against human tumor cell lines. The cDNA sequence of the invention may be obtained by procedures involving, among other things, the use of degenerate oligonucleotides.

The present invention relates to a cDNA sequence encoding a 14 kDa protein from goat liver.

WO 92/10197 discloses perchloric acid extracts of mammalian organs, in particular of goat liver, consisting of at least three different proteins and characterized by unusual pharmacological and immunological properties.

More recently, WO 96/02567 described the partial amino acid sequence of a 14 kDa protein purified from the extracts disclosed in WO 92/10197. This protein shows marked antineoplastic activity both in vitro and in vivo, and the serum of animals immunized with this protein displays cytotoxic activity against human tumor cell lines.

In 1993, Levy-Favatier et al. [Eur. J. Biochem. 212 (3), 665-673] reported the cDNA sequence coding for a 23 kDa dimeric protein purified by 5% perchloric acid from rat liver and kidney; the corresponding amino acid sequence shows a high degree of homology with the sequence of the 14 kDa protein disclosed in WO 96/02567. The cDNA sequence has been submitted to the EMBL Data Bank with Accession no. X70825.

In 1995, another cDNA sequence coding for a 14 kDa protein purified by perchloric acid from rat liver, with a high degree of homology with that published by Levy-Favatier et al., was submitted to the EMBL Data Bank with Accession no. D49363. This sequence was reported by Oka et al. in J. Biol. Chem. (1995) 270 (50), 30060-30067.

Furthermore, in 1996, two novel mRNA sequences with a high degree of homology with the cDNAs published by Levy-Favatier et al. and by Oka et al. were submitted to the EMBL Data Bank. One of these (Accession no. X95384) codes for a 14.5 kDa human protein and was recently published by Schmiedeknecht et al. [Eur. J. Biochem. (1996) 242 (2), 339-351]. The other sequence (Accession no. U50631) codes for a "Mus musculus heat-responsive protein"; to date, no full paper concerning this sequence has been published.

We have now found a new cDNA sequence, encoding the entire 14 kDa protein extracted by perchloric acid from goat liver and disclosed in WO 96/02567. The complete nucleotide sequence is reported in SEQ ID NO: 1.

The cDNA sequence coding for the 14 kDa protein extracted by perchloric acid from goat liver and disclosed in WO 96/02567 (SEQ ID NO: 2) is useful for the preparation of said protein or of muteins thereof by means of recombinant DNA methods, or for diagnostic applications based on nucleotide probes.

The cDNA sequence of the invention was obtained by the following method: two mixtures of degenerate oligonucleotides were synthesized on the basis of the amino acid sequence disclosed in WO 96/02567. One mixture (named PG-1) consists of 2,048 oligo-20-mers, corresponding to the amino acid sequence extending from Met-1 to Gln-7. The other mixture (named PG-2) consists of 192 oligo-20-mers, corresponding to the amino acid sequence extending from Ala-46 to Xaa-52.
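The size of a degenerate oligonucleotide mixture is the product of the number of bases allowed at each position. The following Python sketch illustrates that arithmetic using IUPAC ambiguity codes. The 20-mer shown is a hypothetical design written against the residues Met-Asp-Pro-Ala-Ser-Gly-Gln found where the amplified fragment begins in SEQ ID NO: 1; it is not the PG-1 sequence, which is not given here, and the names `IUPAC_CHOICES`, `degeneracy` and `EXAMPLE_PRIMER` are illustrative only.

```python
# Number of distinct bases represented by each IUPAC nucleotide code.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer stands for."""
    count = 1
    for code in primer.upper():
        count *= IUPAC_CHOICES[code]
    return count


# Hypothetical 20-mer (assumption, not the actual PG-1 mixture):
# ATG GAY CCN GCN WSN GGN CA covers Met Asp Pro Ala Ser Gly and the
# first two bases of a Gln codon.
EXAMPLE_PRIMER = "ATGGAYCCNGCNWSNGGNCA"

print(len(EXAMPLE_PRIMER), degeneracy(EXAMPLE_PRIMER))
```

Under these assumptions the pool size is 2 x 4 x 4 x 2 x 2 x 4 x 4 = 2048, which shows how a seven-residue stretch can give a mixture of the size reported for PG-1.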

Using these oligonucleotide mixtures as primers, and cDNAs obtained by reverse transcription of total RNA purified from goat liver as template, a PCR reaction was performed. The reverse transcription reaction was carried out at 42° C. for 60 min. using oligo(dT)₁₅ as primer. After 35 cycles of amplification under the following conditions: 95° C. for 2 min., 55° C. for 2 min., 72° C. for 1 min., in 2 mM MgCl₂, the amplified DNA fragment (155 bp) was subcloned in the plasmid vector pCRII and the insert of three different clones was sequenced using T7 and Sp6 sequencing primers. The nucleotide sequence of this fragment (corresponding to the region extending from nt. 215 to nt. 369 of SEQ ID NO: 1) confirms the amino acid sequence disclosed in WO 96/02567 and identifies Xaa-33 as Cys.
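The nucleotide positions quoted above can be related to residue numbers of SEQ ID NO: 2 with simple arithmetic, given that the coding region of SEQ ID NO: 1 starts at nt. 101. The following Python sketch shows the conversion; the function names are illustrative, and positions are 1-based and inclusive as in the sequence listing.

```python
from typing import Tuple

CDS_START = 101  # first nucleotide of the ATG in SEQ ID NO: 1 (CDS 101..511)


def codon_span(residue: int, cds_start: int = CDS_START) -> Tuple[int, int]:
    """Return the first and last nucleotide of the codon for a residue."""
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


def residue_at(nt: int, cds_start: int = CDS_START) -> int:
    """Return the residue number whose codon contains nucleotide nt."""
    return (nt - cds_start) // 3 + 1


def span_length(first_nt: int, last_nt: int) -> int:
    """Length in bp of an inclusive nucleotide range."""
    return last_nt - first_nt + 1


print(span_length(215, 369))   # length of the first amplified fragment
print(residue_at(215))         # residue whose codon starts the fragment
print(codon_span(137))         # codon of the last residue of the protein
```

With these definitions, nt. 215 to nt. 369 spans 155 bp, nt. 215 is the first base of the codon for residue 39 of SEQ ID NO: 2, and the codon for residue 137 ends at nt. 511, in agreement with the CDS location given in the sequence listing.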

After the characterization of the complete amino acid sequence of the 14 kDa protein, reported by Ceciliani et al. [FEBS Lett. (1996) 393, 147-150], and following a procedure similar to that described above, the nucleotide sequence extending from nt. 215 to nt. 511 of SEQ ID NO: 1 was determined. Briefly, a mixture of degenerate oligonucleotides (named PG-9) was synthesized. This mixture consists of 32,768 oligo-20-mers corresponding to the amino acid sequence extending from Pro-131 to Val-137. Using the oligonucleotide mixtures PG-1 and PG-9 as primers, and cDNAs obtained by reverse transcription of total RNA purified from goat liver as template, the PCR reaction was performed under the same conditions as before. The amplified DNA fragment of 296 bp was subcloned in the plasmid vector pCRII and the insert was sequenced using T7 and Sp6 sequencing primers.

The nucleotide sequence extending towards the poly(A) tail was determined by a suitably modified 3'-rapid amplification of cDNA ends (3'-RACE) method. In this case, the template used in the PCR reaction was obtained by reverse transcription of total RNA purified from goat liver using an adaptor-linked oligo(dT)₁₇ as primer. The PCR primers were the adaptor and an oligo-30-mer (named PG-4) located on the region extending from nt. 236 to nt. 265. The PCR conditions were as follows: the reaction mixture, containing 2 mM MgCl₂ and only the primer PG-4, was subjected to 10 two-step cycles of amplification: 95° C. for 45 sec., 72° C. for 3 min. Then, after addition of the second PCR primer (the adaptor), 35 cycles of amplification were performed under the following conditions: 95° C. for 45 sec., 52° C. for 1 min., 72° C. for 2 min. To isolate a more discrete DNA fragment, a subsequent nested PCR was performed using a downstream primer named PG-5, extending from nt. 266 to nt. 289 of SEQ ID NO: 1. This DNA fragment (about 800 bp) was likewise subcloned in the plasmid vector pCRII and sequenced with T7 and Sp6 sequencing primers. Its nucleotide sequence confirms the amino acid sequence from Lys-56 towards the C-terminus of the 14 kDa protein described by Ceciliani et al. [FEBS Lett. (1996) 393, 147-150], except for the last amino acid: Val-137 is replaced by Leu-137.

The nucleotide sequence extending towards the 5'-end of the cDNA was determined by the 5'-RACE method. This procedure consists of the synthesis of double-stranded cDNAs from poly(A) RNA extracted from goat liver, ligation of these cDNAs to an adaptor, and amplification by PCR using one primer located on the adaptor sequence and another primer located on the cDNA sequence to be extended. In this specific case, the primer extending from nt. 290 to nt. 316 of SEQ ID NO: 1 was used. The DNA fragments obtained (about 300 bp) were subcloned in the plasmid vector pCRII and sequenced with T7 and Sp6 sequencing primers.
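The primer regions named in the RACE steps can be read directly from SEQ ID NO: 1 by their coordinates. The following Python sketch extracts the regions nt. 236-265 (PG-4), nt. 266-289 (PG-5) and nt. 290-316 from a stretch of the sequence listing covering nt. 212 to nt. 355. The strand orientation of each primer is not stated explicitly; the sketch assumes the 3'-RACE primers match the sense strand as listed and that the 5'-RACE primer is the reverse complement of its region, since it must prime synthesis towards the 5'-end. Function and variable names are illustrative.

```python
REGION_START = 212  # position in SEQ ID NO: 1 of the first base below

# nt. 212..355 of SEQ ID NO: 1, copied from the sequence listing.
REGION = "".join("""
GGT ATG GAC CCT GCA AGT GGA CAG CTT GTG CCA GGA GGG GTG GTA GAA
GAG GCT AAA CAG GCT CTT ACA AAC ATA GGT GAA ATT CTG AAA GCA GCA
GGC TGT GAC TTC ACG AAT GTG GTA AAA GCA ACG GTT TTG CTG GCT GAC
""".split())


def fetch(first_nt: int, last_nt: int) -> str:
    """Return the sense-strand bases for a 1-based inclusive range."""
    return REGION[first_nt - REGION_START:last_nt - REGION_START + 1]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


pg4_region = fetch(236, 265)      # 30-mer region used in 3'-RACE
pg5_region = fetch(266, 289)      # 24-mer region used in the nested PCR
race5_region = fetch(290, 316)    # 27-mer region used in 5'-RACE

print(len(pg4_region), pg4_region)
print(len(pg5_region), pg5_region)
# Assumed antisense orientation for the 5'-RACE primer.
print(len(race5_region), reverse_complement(race5_region))
```

The three regions are contiguous and non-overlapping, and their lengths (30, 24 and 27 nt) follow from the stated coordinates.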

The entire cDNA sequence was finally confirmed by direct DNA sequencing performed on two DNA fragments obtained by two different PCR reactions. As before, the DNA template was cDNA obtained by reverse transcription of total RNA from goat liver using oligo(dT)₁₅ as primer, and the two amplification primers were located at the 5'-end, extending from nt. 1 to nt. 26 of SEQ ID NO: 1, and at the 3'-end, extending from nt. 984 to nt. 1007 of SEQ ID NO: 1. After 35 cycles of amplification under the following conditions: 95° C. for 45 sec., 60° C. for 45 sec., 72° C. for 1 min. 30 sec., in 2 mM MgCl₂, the DNA fragment of 1007 bp was subjected to direct sequencing following the standard procedure indicated in the kit's instructions.

    SEQUENCE LISTING

    (1) GENERAL INFORMATION:

       (iii) NUMBER OF SEQUENCES: 2

    (2) INFORMATION FOR SEQ ID NO: 1:

         (i) SEQUENCE CHARACTERISTICS:
              (A) LENGTH: 1017 base pairs
              (B) TYPE: nucleic acid
              (C) STRANDEDNESS: single
              (D) TOPOLOGY: linear

        (ii) MOLECULE TYPE: cDNA

        (vi) ORIGINAL SOURCE:
              (A) ORGANISM: Capra hircus
              (F) TISSUE TYPE: Liver

        (ix) FEATURE:
              (A) NAME/KEY: CDS
              (B) LOCATION: 101..511

        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

    CTTTGAAGCA GCGATTCTGG CTTCGGCTGG TCAGGCGACG CGAGCAGAAC CGTGTGCTGC    60

    GTACTTGTTT CCGAAGGGCA GCAAAGGAAA AGGGTTAGCC ATG TCG TCT TTG GTC     115
                                                Met Ser Ser Leu Val
                                                1               5

    AGA AGG ATA ATC AGC ACG GCG AAA GCC CCC GCG GCC ATT GGT CCC TAC     163
    Arg Arg Ile Ile Ser Thr Ala Lys Ala Pro Ala Ala Ile Gly Pro Tyr
                                                            20

    AGT CAG GCT GTG TTA GTC GAC AGG ACC ATT TAC ATT TCA GGA CAG CTA     211
    Ser Gln Ala Val Leu Val Asp Arg Thr Ile Tyr Ile Ser Gly Gln Leu
                                                        35

    GGT ATG GAC CCT GCA AGT GGA CAG CTT GTG CCA GGA GGG GTG GTA GAA     259
    Gly Met Asp Pro Ala Ser Gly Gln Leu Val Pro Gly Gly Val Val Glu
                                                    50

    GAG GCT AAA CAG GCT CTT ACA AAC ATA GGT GAA ATT CTG AAA GCA GCA     307
    Glu Ala Lys Gln Ala Leu Thr Asn Ile Gly Glu Ile Leu Lys Ala Ala
                                                65

    GGC TGT GAC TTC ACG AAT GTG GTA AAA GCA ACG GTT TTG CTG GCT GAC     355
    Gly Cys Asp Phe Thr Asn Val Val Lys Ala Thr Val Leu Leu Ala Asp
                                                                85

    ATA AAT GAC TTC AGT GCT GTC AAT GAT GTC TAC AAA CAA TAT TTC CAG     403
    Ile Asn Asp Phe Ser Ala Val Asn Asp Val Tyr Lys Gln Tyr Phe Gln
                                                            100

    AGT AGT TTT CCG GCG AGA GCT GCT TAC CAG GTT GCT GCT TTG CCC AAA     451
    Ser Ser Phe Pro Ala Arg Ala Ala Tyr Gln Val Ala Ala Leu Pro Lys
                                                        115

    GGA GGC CGT GTT GAG ATC GAA GCA ATA GCT GTG CAA GGA CCT CTC ACG     499
    Gly Gly Arg Val Glu Ile Glu Ala Ile Ala Val Gln Gly Pro Leu Thr
                                                    130

    ACA GCA TCA CTC TAAGTGGGCC AAGTGTTATT TAGTCTGGAA ATTTAATAGT         551
    Thr Ala Ser Leu
        135

    ATTTTTAAAC TAATGGCTTA ATCCTTGTTG GAAAGTATTA AGGTTGAAAT ATCTGAAAAT   611

    ATTATGGAAA TACCATATAA TAAGGGAAAC GATATGAATT GAAGATTAAT GATGAATCTA   671

    GTTACTAATA TTACAAATTA TACTTCTGTA ACACTTGTAT TGCTGGATGT GGGAAAACAG   731

    ACATGCCTTA CTGAGTTAAC TCAGAAGAAT AAAAGTAGAA GGAAATAACA TGTAGGAAAG   791

    ATGAGCTACT ATGCCTGAAA AGTAAGGAAA AGCACACCTA ATTCAACTAA ACCCTATTAA   851

    TTTAATGATG GGAAGTATTT TATTATGTCA GATATGTGAT TTTTACTTGA ATAAAACTAA   911

    AGCATTTAAA TTTGAATGGC AGAGATAAAG GAGAAGAAAC TGGACCAAAT TTTATATAGA   971

                      AAA TAAAATAGCA TGCAGATTTT CAAAAA                 1017

    (2) INFORMATION FOR SEQ ID NO: 2:

         (i) SEQUENCE CHARACTERISTICS:
              (A) LENGTH: 137 amino acids
              (B) TYPE: amino acid
              (D) TOPOLOGY: linear

        (ii) MOLECULE TYPE: protein

        (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

    Met Ser Ser Leu Val Arg Arg Ile Ile Ser Thr Ala Lys Ala Pro Ala
                                                            15

    Ala Ile Gly Pro Tyr Ser Gln Ala Val Leu Val Asp Arg Thr Ile Tyr
                                                        30

    Ile Ser Gly Gln Leu Gly Met Asp Pro Ala Ser Gly Gln Leu Val Pro
                                                    45

    Gly Gly Val Val Glu Glu Ala Lys Gln Ala Leu Thr Asn Ile Gly Glu
                                                60

    Ile Leu Lys Ala Ala Gly Cys Asp Phe Thr Asn Val Val Lys Ala Thr
                                                                80

    Val Leu Leu Ala Asp Ile Asn Asp Phe Ser Ala Val Asn Asp Val Tyr
                                                            95

    Lys Gln Tyr Phe Gln Ser Ser Phe Pro Ala Arg Ala Ala Tyr Gln Val
                                                        110

    Ala Ala Leu Pro Lys Gly Gly Arg Val Glu Ile Glu Ala Ile Ala Val
                                                    125

    Gln Gly Pro Leu Thr Thr Ala Ser Leu
                            135
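The consistency of the two sequences can be checked by translating the coding region of SEQ ID NO: 1 with the standard genetic code and comparing the result with SEQ ID NO: 2. The following Python sketch does this for nt. 101 to nt. 514 (the coding region plus the stop codon), copied from the listing, and also estimates the molecular mass of the encoded polypeptide from average residue masses. The mass table holds commonly tabulated average values and the calculation ignores any post-translational modification, so the figure is only an approximation; all names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# nt. 101..514 of SEQ ID NO: 1 (CDS 101..511 plus the TAA stop codon).
CDS_WITH_STOP = "".join("""
ATG TCG TCT TTG GTC
AGA AGG ATA ATC AGC ACG GCG AAA GCC CCC GCG GCC ATT GGT CCC TAC
AGT CAG GCT GTG TTA GTC GAC AGG ACC ATT TAC ATT TCA GGA CAG CTA
GGT ATG GAC CCT GCA AGT GGA CAG CTT GTG CCA GGA GGG GTG GTA GAA
GAG GCT AAA CAG GCT CTT ACA AAC ATA GGT GAA ATT CTG AAA GCA GCA
GGC TGT GAC TTC ACG AAT GTG GTA AAA GCA ACG GTT TTG CTG GCT GAC
ATA AAT GAC TTC AGT GCT GTC AAT GAT GTC TAC AAA CAA TAT TTC CAG
AGT AGT TTT CCG GCG AGA GCT GCT TAC CAG GTT GCT GCT TTG CCC AAA
GGA GGC CGT GTT GAG ATC GAA GCA ATA GCT GTG CAA GGA CCT CTC ACG
ACA GCA TCA CTC TAA
""".split())

# SEQ ID NO: 2 in one-letter code, 16 residues per line as in the listing.
SEQ_ID_NO_2 = (
    "MSSLVRRIISTAKAPA" "AIGPYSQAVLVDRTIY" "ISGQLGMDPASGQLVP"
    "GGVVEEAKQALTNIGE" "ILKAAGCDFTNVVKAT" "VLLADINDFSAVNDVY"
    "KQYFQSSFPARAAYQV" "AALPKGGRVEIEAIAV" "QGPLTTASL"
)

# Average residue masses in Da (assumed textbook values).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.015  # one water per chain for the free termini


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def average_mass(protein: str) -> float:
    """Approximate average molecular mass of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


translated = translate(CDS_WITH_STOP)
protein = translated.rstrip("*")

print("matches SEQ ID NO: 2:", protein == SEQ_ID_NO_2)
print("residues:", len(protein))
print("approximate mass (kDa):", round(average_mass(protein) / 1000, 1))
```

The translation reproduces the 137 residues of SEQ ID NO: 2, ending in Leu-137, and the estimated mass falls close to the 14 kDa figure used to name the protein.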

What is claimed is:
 1. The cDNA sequence of SEQ ID NO:1.